Belfast
Understanding Synthetic Context Extension via Retrieval Heads
Zhao, Xinyu, Yin, Fangcong, Durrett, Greg
Long-context LLMs are increasingly in demand for applications such as retrieval-augmented generation. To defray the cost of pretraining LLMs over long contexts, recent work takes an approach of synthetic context extension: fine-tuning LLMs with synthetically generated long-context data in a post-training stage. However, it remains unclear how and why this synthetic context extension imparts abilities for downstream long-context tasks. In this paper, we investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning. We vary the realism of "needle" concepts to be retrieved and diversity of the surrounding "haystack" context, from using LLMs to construct synthetic documents to using templated relations and creating symbolic datasets. We find that models trained on synthetic data fall short of models trained on real data, but surprisingly, the mismatch can be interpreted and even predicted in terms of a special set of attention heads that are responsible for retrieval over long context, known as retrieval heads (Wu et al., 2024). The retrieval heads learned on synthetic data have high overlap with retrieval heads learned on real data, and there is a strong correlation between the recall of heads learned and the downstream performance of a model. Furthermore, with attention knockout and activation patching, we mechanistically show that retrieval heads are necessary and explain model performance, although they are not totally sufficient. Our results shed light on how to interpret synthetic data fine-tuning performance and how to approach creating better data for learning real-world capabilities over long contexts.
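The "recall of heads learned" mentioned in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' code: it assumes retrieval heads are identified as (layer, head) pairs and that recall means the fraction of real-data retrieval heads that also appear after synthetic fine-tuning. The function name `head_recall` and the example indices are hypothetical.

```python
def head_recall(synthetic_heads, real_heads):
    """Fraction of real-data retrieval heads also found after synthetic fine-tuning.

    Assumption: each head is a (layer, head_index) pair; this definition of
    recall is an illustrative reading of the abstract, not the paper's code.
    """
    real = set(real_heads)
    if not real:
        raise ValueError("real_heads must be non-empty")
    return len(real & set(synthetic_heads)) / len(real)


# Hypothetical head indices, for illustration only (not results from the paper).
real = [(10, 3), (12, 7), (15, 1), (20, 4)]
synthetic = [(10, 3), (15, 1), (18, 2)]
recall = head_recall(synthetic, real)
```

Under this reading, a synthetic dataset whose fine-tuned model recovers more of the real-data retrieval heads would be expected to transfer better to the real task.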
Everyone Has 'Car Brain'
This article was featured in One Story to Read Today, a newsletter in which our editors recommend a single must-read from The Atlantic, Monday through Friday. Francis Curzon, born in 1884 and later named the fifth Earl Howe, loved a souped-up Bugatti. And he loved to drive fast. He was famous for his "great skill and daring" on the racetrack, and also, eventually, for crashing into pedestrians: knocking down a boy in Belfast, Northern Ireland; slamming into a horse-drawn cart and killing a peasant in Pesaro, Italy. These incidents (and 10 more) were recounted in a 1947 polemic by J. S. Dean, chair of the Pedestrians' Association in England.
AWS Data Engineer at PA Consulting - Belfast, United Kingdom
We're an innovation and transformation consultancy that believes in the power of ingenuity to build a positive human future in a technology-driven world. Our diverse teams of experts combine innovative thinking with breakthrough technologies to progress further, faster. With a global network of FTSE 100 and Fortune 500 clients, we'll offer you unrivalled opportunities for growth and the freedom to excel. Combining strategies, technologies and innovation, we turn complexity into opportunity and deliver enduring results, enabling you to build a lasting career. Isn't it time you joined us?